home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The PC-SIG Library 10
/
The PC-Sig Library - Shareware for the IBM PC and Compatibles (PC-SIG)(Tenth Edition Disks 1-2804)(1991).iso
/
PC_SIGCD
/
22
/
4
/
DISK2247.ZIP
/
CBASE101.ZIP
/
GUIDE.TXT
< prev
next >
Wrap
Text File
|
1990-06-21
|
57KB
|
1,538 lines
cbaseTM
The C Database Library
Citadel
Brookville, IndianaCopyright 1989 by Citadel. All rights reserved.
Citadel
241 East Eleventh Street
Brookville, IN 47012
317-647-4720
BBS 317-647-2403
Version 1.0.1
This manual is protected by United States copyright law. No part of it
may be reproduced without the express written permission of Citadel.
Technical Support
The Citadel BBS is available 24 hours a day. Voice support is
available between 10 a.m. and 4 p.m. EST. When calling for technical
support, please have ready the following information:
- product name and version number
- operating system and version number
- C compiler and version number
- computer brand and model
UNIX is a trademark of AT&T. MS-DOS is a trademark of Microsoft
Corporation. Turbo C is a trademark of Borland International, Inc.
Contents
Chapter 1. Introduction 1
Chapter 2. Database Basics 3
2.1 Sequential File Structures
2.2 Inverted Files
2.3 B-trees
Chapter 3. cbase Library Functions 7
3.1 Access Control Functions
3.2 Lock Functions
3.3 Record Cursor Position Functions
3.4 Key Cursor Position Functions
3.5 Input/Output Functions
3.6 Data Import/Export Functions
Chapter 4. An Example Program 15
4.1 Data Definition
4.2 Opening a cbase
4.3 Locking a cbase
4.4 Accessing a cbase
4.5 Closing a cbase
4.6 Manipulating Exported Data
Appendix A. Installation Instructions 27
A1 The blkio Library
A2 The btree Library
A3 The lseq Library
A4 The cbase library
Appendix B. Defining New Data Types 33
B1 The Type Name
B2 The Comparison Function
B3 The Export and Import Functions
B4 The Type Count
Appendix C. Porting to a New Operating System 37
C1 The HOST macro
C2 The File Descriptor Type
C3 File Access System Calls
C4 File Locking System Calls
References 41
Chapter 1: Introduction
cbase is a C database file management library. Records may be
accessed both randomly and sequentially through indexes stored in
B+-trees. Records may also be accessed sequentially in the order in
which they are stored. Multiuser access is supported under any
operating system with file locking capabilities.
cbase is designed for portability. It is written in strict adherence
to the ANSI C standard while retaining compatibility with with K&R C
compilers. All system dependent code is isolated in order to make
porting easy.
Many of the operations performed by cbase internally represent
independently useful tools, and the software has been designed with this
in mind. cbase actually comprises four individual libraries, each
complete and independently accessible. Figure 1.1 shows these libraries
and their relationships.
┌─────────────────────────────────┐
│ cbase │
└───────┬─────────────────┬───────┘
┌───────┴───────┐ ┌───────┴───────┐
│ lseq │ │ btree │
└───────┬───────┘ └───────┬───────┘
┌───────┴─────────────────┴───────┐
│ blkio │
└─────────────────────────────────┘
Figure 1.1. cbase and Underlying Libraries
At the foundation of cbase is the blkio (block buffered I/O) library.
blkio is a buffered I/O library similar to stdio, but based on a file
model more appropriate for structured files such as used in database
software. While stdio models a file as an unstructured stream of
characters, blkio models a file as a collection of blocks made up of
fields (see FROS89 for a complete description of blkio).
The lseq (linked sequential file) library provides all the facilities
necessary for the creation and manipulation of doubly linked sequential
files. The btree (B-tree) library provides the same for B+-tree files.
The cbase library uses lseq and btree to perform all structured file
management operations. The lseq library is used for record storage and
the btree library for inverted file key storage.
When using a particular library, all operations are performed with
functions provided by that library. No references need be made to
underlying libraries. When using the cbase library, it is therefore not
necessary to know the functions included in the other libraries.
Chapter 2: Database Basics
This chapter describes some of the basic database concepts relevant
to cbase. It is intended to be a brief and readily accessible
introduction, and can be easily skipped by one already familiar with
database fundamentals. References are given where in-depth discussions
may be found.
2.1 Sequential File Structures
One of the simplest file organizations is the physical sequential
file. In this organization, records are simply written one after the other
and sequential access is done in the order in which the records are
physically stored.
The physical sequential file works very well in cases where the
data is static, but it is extremely inefficient when the data is dynamic.
Consider a file containing 100,000 records stored in a sorted order.
The insertion of a single record at the beginning of this file would
result in 100,000 records being moved to make room at the beginning
of the file for the new record.
This identical problem also occurs when large ordered lists are
stored in memory, and the same solution, the linked list, can be used
for files (see pp. 106-12 of HORO76 for an explanation of linked lists
in memory). In a linked list, the record order is determined by pointers
stored with each record, not by the physical record locations; a record
logically located at the beginning of a file can be physically located
anywhere within the file. In a singly linked list each record has only a
pointer to the next record and so the file can be traversed in just one
direction. In a doubly linked list, each record has both a next and a
previous pointer, allowing bidirectional access.
2.2 Inverted Files
Assume a data file containing member records for an organization,
and that the record format for this file is
typedef struct {
long number;
char name[24];
char address[81];
char city[24];
char state[3];
char zip[11];
} member_t;
Assume further that the data file has a physical sequential file structure,
and that the records are sorted by the number field, therefore a binary
search may be performed to quickly find a record with a given number
field. The field that the records are sorted on is called the primary
index. The problem arises when a query is made on another field
besides the primary index, in which case a search of the entire data file
is required.
The problem of being able to perform queries efficiently on more
than one field can be resolved through the use of inverted files. For
each secondary index exists an inverted file containing an entry for each
record in the data file. The inverted file for the name field, for
example, would have entries containing the name field and record
position, and these would be sorted by the name field. To find a
record with a given value in the name field, the inverted file for the
name field would be searched. The record position paired with the
specified name value would then be used to locate the record in the
data file. See pp. 531-533 of HORO76 and pp. 75-78 of ULLM82 for
more information on inverted files.
2.3 B-trees
As discussed above, the physical sequential file provides
unacceptable performance for dynamic data. For static data, however, it
provides efficient sequential and passable random access (using a binary
search). The linked sequential file solves the problem of performance
for dynamic data while preserving efficient sequential access, but the
ability to randomly access records is lost. The need to store dynamic
data which may be randomly accessed led to the development of the
B-tree.
As the name implies, the B-tree stores data in a tree structure,
which allows random access. And since the tree nodes are connected
by pointers, the update problem is solved in the same way as by the
linked sequential file organization. The basic B-tree does not provide
very efficient sequential access, however. This led to a B-tree variant
called the B+-tree. The B+-tree incorporates a linked list into its
structure to achieve efficient sequential as well as random access.
While the B+-tree does come close to providing the best of all
worlds, it has two important drawbacks. First, there is significantly
more storage overhead for a B+-tree than for a sequential file. Every
entry in a B+-tree is stored in a leaf node, and the rest of the tree is
simply scaffolding used when the tree is searched. Second, each entry
in any type of B-tree must be unique. These characteristics make
B-tree file organizations inappropriate for data files containing records,
which may be both large and duplicated. But for inverted files,
generally having relatively short entries that are by their very nature
unique (because the file position of each record in the data file would
be unique), the B+-tree is an ideal choice. cbase therefore uses the
linked sequential file organization for record storage and B+-trees for
inverted files.
The mechanics of B-trees is beyond the scope of this chapter. A
good discussion of the B-tree and its major variants may be found in
COME79.
Chapter 3: cbase Library Functions
3.1 Access Control Functions
The cbcreate function is used to create a new cbase.
int cbcreate(const char *cbname, size_t recsize,
int fldc, const cbfield_t fldv[]);
cbname points to a character string which is the name of the cbase.
This name is used as the name of the data file containing the records in
the cbase. recsize specifies the record size to be used. fldc is the
number of fields in the cbase, and fldv is an array of fldc field
definition structures. Each structure in the array contains the definition
for one field. field definitions must be in the order the fields occur in
the record. The field definition structure type cbfield_t is defined
in <cbase.h>.
typedef struct { /* field definition */
size_t offset; /* field offset */
size_t size; /* size of field */
int type; /* type of field */
int flags; /* flags */
char filename[FILENAME_MAX + 1];
/* data file name */
} cbfield_t;
offset is the location of the field within the record and size is the
size of the field. type specifies the field data type, legal values for
which are shown in Table 3.1. flags values are constructed by
bitwise ORing together flags from the following list.
CB_FKEY Field is to be a key.
CB_FUNIQ Only for use with CB_FKEY.
Indicates that the key is
constrained to be unique.
FILENAME_MAX is an ANSI C definition indicating the maximum
length of a filename string. It is defined in <stdio.h>.
t_char signed character
t_charv signed character array
t_uchar unsigned character
t_ucharv unsigned character array
t_short signed short integer
t_shortv signed short integer array
t_ushort unsigned short integer
t_ushortv unsigned short integer array
t_int signed integer
t_intv signed integer array
t_uint unsigned integer
t_uintv unsigned integer array
t_long signed long integer
t_longv signed long integer array
t_ulong unsigned long integer
t_ulongv unsigned long integer array
t_float floating point
t_floatv floating point array
t_double double precision
t_doublev double precision array
t_ldouble long double
t_ldoublev long double array
t_pointer pointer
t_string character string
t_cistring case-insensitive character string
t_binary block of binary data (e.g., graphics)
Table 3.1. cbase Data Types
The first step in accessing an existing cbase is to open it, which is
done using the function
cbase_t *cbopen(const char *cbname, const char
*type, int fldc, const cbfield_t fldv[]);
cbname, fldc, and fldv are the same as for cbcreate, and must
be given the same values as when the cbase was created. type points
to a character string specifying the type of access for which the cbase
is to be opened (as for the stdio function fopen). Legal values for
type are
"r" open for reading
"r+" open for update (reading and writing)
cbopen returns a pointer to the open cbase.
The cbsync function causes any buffered data for a cbase to be
written out.
int cbsync(cbase_t *cbp);
The cbase remains open and the buffers retain their contents.
After processing is completed on an open cbase, it must be closed
using the function
int cbclose(cbase_t *cbp);
The cbclose function causes any buffered data for the cbase to be
written out, unlocks it, closes it, and frees the cbase pointer.
3.2 Lock Functions
Before an open cbase can be accessed, it must be locked. The
function used to control the lock status of a cbase is
int cblock(cbase_t *cbp, int ltype);
where cbp is a pointer to an open cbase and ltype is the lock type
to be placed on the cbase. The legal values for ltype are
CB_RDLCK lock cbase for reading
CB_WRLCK lock cbase for reading and writing
CB_RDLKW lock cbase for reading (wait)
CB_WRLKW lock cbase for reading and writing (wait)
CB_UNLCK unlock cbase
If ltype is CB_RDLCK and the cbase is currently write locked by
another process, or if ltype is CB_WRLCK and the cbase is currently
read or write locked by another process, cblock will fail and set
errno to EAGAIN. For the wait lock types, cblock will not return
until the lock is available.
The cbgetlck function reports the lock status held by the calling
process on a cbase.
int cbgetlck(cbase_t *cbp);
It returns one of the legal values for the ltype argument in the
cblock function.
3.3 Record Cursor Position Functions
Each open cbase has a record cursor. At any given time the
record cursor is positioned either on a record in that cbase or on a
special position called null. The record on which the cursor is located
is referred to as the current record. The operations performed by most
cbase functions are either on or relative to the current record, so the
initial step in a transaction on a cbase is usually to position the record
cursor on the desired record. When accessing the records in a cbase in
the order that they are stored, the following functions are used to move
the record cursor.
int cbrecfirst(cbase_t *cbp);
int cbreclast(cbase_t *cbp);
int cbrecnext(cbase_t *cbp);
int cbrecprev(cbase_t *cbp);
The cbrecfirst function positions the record cursor to the first
record, and cbreclast to the last record. Before calling either of
these functions cbreccnt should be used to test if the cbase is empty.
unsigned long cbreccnt(cbase_t *cbp);
If the cbase is empty, there is no first or last record and so these
functions would return an error. The cbrecnext function advances
the record cursor to the succeeding record, and cbrecprev retreats it
to the preceding record. In the record ordering, null is located before
the first record and after the last.
There are also functions for saving the current position of the
record cursor and resetting it to that position.
int cbgetrcur(cbase_t *cbp, cbrpos_t *cbrposp);
int cbsetrcur(cbase_t *cbp, const
cbrpos_t*cbrposp);
The cbgetrcur function gets the current position of the record cursor
and saves it in the variable pointed to by cbrposp. cbrpos_t is the
cbase record position type, defined in <cbase.h>. cbsetrcur can
then be used later to set the record cursor back to that position. The
record cursor can be positioned on null by passing cbsetrcur the
NULL pointer rather than a pointer to a variable. Other than this
special case, cbsetrcur should only be called with record cursor
positions previously saved with cbgetrcur.
The cbrcursor macro is used to test if the record cursor for a
cbase is positioned on a record or on null.
void *cbrcursor(cbase_t *cbp);
If the record cursor of the cbase pointed to by cbp is positioned on
null, cbrcursor returns the NULL pointer. If it is on a record,
cbrcursor returns a value not equal to the NULL pointer. This
function is useful for loops needing to test when the last (or first)
record has been reached.
The cbrecalign function aligns the record cursor with a
specified key cursor.
int cbrecalign(cbase_t *cbp, int field);
field is the key with which to align the record cursor. The
relationship between the key cursors and the record cursor is explained
in the next section.
3.4 Key Cursor Position Functions
In addition to a record cursor, each open cbase also has a key
cursor for each key defined for that cbase. Like the record cursor, a
key cursor is positioned either on a record in that cbase or on null. To
access a cbase in the sort order of a certain key, the appropriate key
cursor is used instead of the record cursor. Each key cursor moves
independently of the others, but whenever a key cursor position is set,
the record cursor is moved to the same record. The key cursors are
not affected by moving the record cursor.
The following functions are used to move a key cursor.
int cbkeyfirst(cbase_t *cbp, int field);
int cbkeylast(cbase_t *cbp, int field);
int cbkeynext(cbase_t *cbp, int field);
int cbkeyprev(cbase_t *cbp, int field);
These perform as do the corresponding functions for the record cursor.
Note that the key cursor functions can be used only with fields defined
to be keys (see cbcreate in section 3.1).
The following function is used to search for a key of a certain
value.
int cbkeysrch(cbase_t *cbp, int field,
const void *buf);
field is the key to search for the data item pointed to by buf. If
the key is found, 1 is returned and the key and record cursors
positioned to the record having that key. If there is no record with that
key, 0 is returned and the key and record cursor positioned to the
record (possibly null) that would follow a record with that key value.
Since the key cursors do not automatically follow the record
cursor, the situation sometimes occurs where the record cursor is
positioned to the desired record, but the cursor for the key to be used
next is not. The cbkeyalign function is used to align a specified
key cursor with the record cursor.
int cbkeyalign(cbase_t *cbp, int field);
The reason the key cursors are not updated every time the record cursor
moves is not because it would be in any way difficult to do so, but
because this would increase the overhead enormously. And since only
one key cursor is normally used at a time, this extra overhead would
almost never provide any benefit in return.
As for the record cursor, each key cursor position can be tested to
be positioned on a record or on null.
void *cbkcursor(cbase_t *cbp, int field);
If the key cursor specified by field of the cbase pointed to by cbp is
positioned on null, cbkcursor returns the NULL pointer. If it is on a
record, cbkcursor returns a value not equal to the NULL pointer.
3.5 Input/Output Functions
To read a record from a cbase, the record cursor for that cbase is
first positioned to the desired record using either the record cursor
position functions or the key cursor position functions. One of the
following functions is then called to read from the current record.
int cbgetr(cbase_t *cbp, void *buf);
int cbgetrf(cbase_t *cbp, int field, void *buf);
cbp is a pointer to an open cbase and buf points to the storage area
to receive the data read from the cbase. The cbgetr function reads
the entire current record, while cbgetrf reads the specified field
from the current record.
The function for inserting a new record into a cbase is
int cbinsert(cbase_t *cbp, const void *buf);
where buf points to the record to be inserted. When a new record is
inserted into a cbase, the position it holds relative to each key cursor is
defined by the sort order for that key field. There is no predefined sort
order associated with the record cursor, however, and it is up to the
user whether or not to store the records for each cbase in a sorted or
unsorted order. To store records in a sorted order, the record cursor is
first positioned the the record after which to insert the new record.
cbinsert is then called to insert the record pointed to by buf after
the current record. If no sort order is desired, the step to position the
record cursor is skipped.
The cbdelcur function is used to delete a record.
int cbdelcur(cbase_t *cbp);
The record cursor must first be positioned on the record to delete, then
cbdelcur called to delete the current record.
The cbputr function writes over an existing record.
int cbputr(cbase_t *cbp, const void *buf);
buf points to the new record contents. Writing over an existing record
is equivalent to deleting the record and inserting a new one in the same
position in the file. If the new record contains an illegal duplicate key,
this will cause the insert to fail, resulting in the record having been
deleted from the cbase. The exact behaviour that a program should
have in such a circumstance is different for different applicatons, and so
it is often desirable to use cbdelcur and cbinsert directly rather
than cbputr.
3.6 Data Import/Export Functions
cbase data can be exported to a text file using the cbexport
function.
int cbexport(cbase_t *cbp, const char *filename);
Every record in cbase cbp is converted to a text format and written to
the file filename. The export file format is defined as follows.
- Each record is terminated by a newline ('\n').
- The fields in a record are delimited by vertical
bars ('|').
- Each field contains only printable characters.
- If a field contains the field delimiter
character, that character is replaced with \F.
- The individual elements of array data types are
exported as individual fields.
Data may be imported from a text file using the cbimport
function.
int cbimport(cbase_t *cbp, const char *filename);
cbimport reads each record from the text file filename and inserts
it into the cbase cbp.
Data import/export is primarily used to move data between
different database formats. This sometimes requires some slight
rearranging of the text before importing. This can be most easily done
using awk, a language designed specifically for manipulating records
stored in text files. awk is normally included with UNIX, and is also
available for MS-DOS.
Chapter 4: An Example Program
Included with cbase is rolodeck, an example program illustrating
the use of cbase. Rolodeck is a program for storing business cards.
To allow it to be compiled without requiring any additional libraries for
displays, and because the purpose of the program is purely instructional,
the program has been given only a simple user interface.
4.1 Data Definition
The first step in writing a program using cbase is to define the
data to be stored. This should be done in a separate header file for
each record type to be used. Figure 4.1 shows rolodeck.h, the data
definition header for the business card record type used by the rolodeck
program.
Starting at the top, the first thing to note is the definition of a
macro to prevent multiple includes. This is a general practice
applicable to any header file, cbase or otherwise, whose purpose is to
allow a header to be specified for inclusion multiple times in a file
while ensuring that the definitions within the file are processed only
once. For a header containing nothing but macro definitions, the only
effect of preventing multiple includes will be reduced compile times.
But for a header containing, for instance, type definitions, multiple
includes would produce compilation errors. Notice how the include
macro is related to the header file name; the file name is converted to
upper case and the period replaced by an underscore ('.' is not a valid
character for C identifiers). These macros should be named consistently
to minimize the possibility of accidentally defining the same macro
elsewhere for some other purpose.
At the beginning of every data definition header file, <cbase.h>
should be included. Following this is defined the name of the cbase.
This character string is the name used for the data file. This macro is
to be used for the cbname argument when calling cbcreate and
cbopen.#ifndef ROLODECK_H /* prevent multiple includes */
#define ROLODECK_H
#include <cbase.h>
/* cbase name */
#define ROLODECK ("rolodeck.dat")
/* record definition */
typedef struct {
char rd_contact[41]; /* contact name */
char rd_title[41]; /* contact title */
char rd_company[41]; /* company name */
char rd_addr[2*40+1]; /* company address */
char rd_city[26]; /* city */
char rd_state[3]; /* state */
char rd_zip[11]; /* zip code */
char rd_phone[13]; /* phone number */
char rd_ext[5]; /* phone extension */
char rd_fax[13]; /* fax number */
char rd_notes[4*40+1]; /* notes */
} rolodeck_t;
/* field names */
#define RD_CONTACT (0)
#define RD_TITLE (1)
#define RD_COMPANY (2)
#define RD_ADDR (3)
#define RD_CITY (4)
#define RD_STATE (5)
#define RD_ZIP (6)
#define RD_PHONE (7)
#define RD_EXT (8)
#define RD_FAX (9)
#define RD_NOTES (10)
#define RDFLDC (11)
/* field definition list */
extern const cbfield_t rdfldv[RDFLDC];
#endif /* #ifndef ROLODECK_H */
Figure 4.1. rolodeck.h
#ifndef ROLODECK_I /* prevent multiple includes */
#define ROLODECK_I
#include <cbase.h>
#include <stddef.h>
#include "rolodeck.h"
/* field definition list */
static cbfield_t rd_fldv[] = {
{offsetof(rolodeck_t, rd_contact),
sizeofm(rolodeck_t, rd_contact),
t_string, CB_FKEY | CB_FUNIQ, "rdcont.ndx"},
{offsetof(rolodeck_t, rd_title),
sizeofm(rolodeck_t, rd_title),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_company),
sizeofm(rolodeck_t, rd_company),
t_string, CB_FKEY, "rdcomp.ndx"},
{offsetof(rolodeck_t, rd_addr),
sizeofm(rolodeck_t, rd_addr),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_city),
sizeofm(rolodeck_t, rd_city),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_state),
sizeofm(rolodeck_t, rd_state),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_zip),
sizeofm(rolodeck_t, rd_zip),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_phone),
sizeofm(rolodeck_t, rd_phone),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_ext),
sizeofm(rolodeck_t, rd_ext),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_fax),
sizeofm(rolodeck_t, rd_fax),
t_string, 0, ""},
{offsetof(rolodeck_t, rd_notes),
sizeofm(rolodeck_t, rd_notes),
t_string, 0, ""}
};
#endif /* #ifdef ROLODECK_I */
Figure 4.2. rolodeck.i
Next is the type definition for the record being defined. Each
field in the record (member in the structure) begins with a character
sequence derived from the cbase name, usually two characters, followed
by an underscore. This character sequence should be unique across all
cbase record types used by a given program. This type is for use when
declaring record variables. Variables of this type should be used for
functions such as cbgetr requiring a pointer to a record.
Following the record type definition is a list of macros used to
identify a field when calling cbase functions. The names of these
macros are constructed by taking the field names from the record type
definition and converting them to uppercase; this is why the field names
must be made unique. Each field name macro must be an integer
indicating the number of the field in the record type definition, starting
at zero. A macro for the total number of fields is also defined. This
macro is to be used for the fldc argument when calling cbcreate
and cbopen. Finally, the field definition array is declared. The
elements of the field definition list are of type cbfield_t, defined in
<cbase.h>.
Since more than one source file will in general need to include the
same header, the actual definition of the field definition array must be
placed in a separate file. This file is given a .i extension to indicate
that it contains data or code as opposed to a .h file containing only
definitions, and it is included by only one source file, normally the one
containing main. rolodeck.i is shown in figure 4.2.
The field definition array specifies, for each field in the record,
that field's offset in the record, size, type, some flags, and a filename
to be used if the field is to be a key. The offsetof macro should
be used to specify the offset. A macro sizeofm (size of member) is
defined in <blkio.h> to give the size of a member of a structure.
size_t sizeofm(struct_t, member);
The arguments to sizeofm are the same as for offsetof. The size
cannot be calculated from the difference between field offsets because
padding characters may be added between structure members to
maintain proper alignment (see KERN88). The field definition array is
to be used as the fldv argument to cbcreate and cbopen.
It should be noted that every record type should normally have at
least one unique key field. This field is used to uniquely identify
records. The physical record position is often used for this purpose,
but, as will be discussed later in this chapter, a unique key is required
for multiuser applications.
In addition to the data definition header, all programs using cbase
normally require that the following header files also be included.
#include <blkio.h>
#include <cbase.h>
#include <errno.h>
<blkio.h> is the header for the blkio library. It is included to
provide the definition of the NULL pointer and the declaration of the
function bexit. bexit is for use in place of exit in any program
using the blkio library for file access. It writes out any buffered data
for any open block file then calls exit (see bexit in the blkio
reference manual). <cbase.h> is the cbase header file containing all
the constant and type definitions and function declarations for using the
cbase library. <errno.h> contains the declaration of errno as well
as definitions of error code values; all cbase functions use errno for
error reporting.
4.2 Opening a cbase
The first step in accessing an existing cbase is to open it. Figure
4.3 shows the code from rolodeck.c to open the rolodeck cbase.
rolodeck is opened with a type argument of "r+" to allow both
reading and writing. The other arguments are the cbase name,
ROLODECK, the field count, RDFLDC, and the field definition list,
rdfldv, all defined in the data definition header file, rolodeck.h.
On error cbopen returns the NULL pointer. For this program there is
only one cbase, but most applications will require more.
If the named cbase does not exist, cbopen will fail and set errno
to ENOENT. In this example, if the rolodeck cbase does not exist, it is
created and the program continues as normal. Note that the cbase must
still be opened after it is created. In some cases a separate program is
written to create all the cbases required by an application; in this case
the main program would probably interpret ENOENT as an error.
/* open rolodeck cbase */
cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv);
if (cbp == NULL) {
if (errno != ENOENT) {
bexit(EXIT_FAILURE);
}
/* create rolodeck cbase */
printf("Rolodeck does not exist. Creating...\n");
if (cbcreate(ROLODECK, sizeof(rolodeck_t), RDFLDC, rdfldv)
== -1) {
bexit(EXIT_FAILURE);
}
cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv);
if (cbp == NULL) {
bexit(EXIT_FAILURE);
}
}
Figure 4.3. Opening a cbase
4.3 Locking a cbase
Before accessing an open cbase, it must first be locked. If data is
to be written to the cbase, it must be write locked, otherwise only a
read lock is required. A cbase can be read locked by more than one
process at the same time, and read locks are therefore also called
shared locks. A write lock, on the other hand, is an exclusive lock; a
write locked cbase can be neither read nor write locked by any other
process. Write locks are exclusive because, if one process tried to read
data while it was partially modified by another, the data would probably
be in an inconsistent state. Processes that will only read data, however,
can safely do so concurrently.
While a cbase is write locked, other processes needing to access
that cbase must wait until it is unlocked so that they can in turn lock it
themselves to complete their processing. While a cbase is read locked,
only processes needing to write must wait. Using a write lock when a
read lock would suffice will therefore delay other processes
unnecessarily. Locks of either type should be held for the shortest time
possible; a common mistake in writing multiuser applications is to
pause for use input while holding a lock, causing that lock to be held
indefinitely.
If an attempt is made to obtain a lock on a cbase, but is blocked
by a lock held by another process, cblock will fail and set errno to
EAGAIN. The call to cblock is therefore usually made in a loop with a
predefined maximum number of tries. It is convenient to place this in
a function configured for the application being developed. Figure 4.4
shows this function from rolodeck.c. It may also be suitable in
some instances to sleep for a short (possibly random) time between
attempts to lock.
#define LMAXTRIES (20) /* max lock tries */
/* rdlock: rolodeck lock */
int rdlock(cbase_t *cbp, int ltype)
{
for (int i = 0; i < LMAXTRIES; ++i) {
if (cblock(cbp, ltype) == -1) {
if (errno == EAGAIN) {
continue;
}
return -1;
} else {
errno = 0;
return 0;
}
}
errno = EAGAIN;
return -1;
}
Figure 4.4. Rolodeck Locking Function
There are also two lock types (CB_RDLKW and CB_WRLCKW)
which, if the requested lock is blocked, will wait until it can be
obtained. These are not usually used, however, because if the lock
does not become free in a reasonable time, the process waiting for the
lock will be hung.
For an applications where there will be only a single process, the
necessary locks can be set immediately after opening the cbases to be
accessed and left locked.
One critical concern when locking multiple cbases is the possibility
of deadlock. Deadlock is an extensive subject, and there are a number
of ways of dealing with it. Most texts on operating systems (see
CALI82) and database theory cover the subject in detail.
4.4 Accessing a cbase
The gross structure of the rolodeck program is a case statement
within a loop. At the start of the loop a user request is read and used
to select the action performed in the case statement. Each individual
action performed in the case statement illustrates the use of cbase to
perform a basic operation, e.g., inserting a record, deleting a record,
finding the next record, exporting data to a text file, etc. The operation
of finding the next record serves as a good general example. The code
for this from rolodeck.c is shown in figure 4.5.
One of the most important points to notice in the example code is
that a unique key (the contact name, here) is used to relocate the
current record when a cbase is locked. This is because, when a cbase
is unlocked, it may be modified by another process. A record at a
given location may be deleted, and the empty slot possibly reused for a
new record. Because of this, cbsetrpos cannot be used with a
record position obtained during a previously held lock. As mentioned
in the section 4.3, applications that do not involve multiple processes
accessing the same data can simply lock a cbase and leave it locked
rather than locking the cbase immediately prior to each transaction.
Another central point is the use of multiple keys. In the rolodeck
program, both the contact and the company names are keys. A variable
sf is used in rolodeck.c to identify the current sort field, which
can be changed interactively. Before using the cbkeynext function,
the appropriate key cursor must first be positioned. cbkeysrch
positions only the key being searched, here being the unique key. If
the next card is to be found using the sort order of a different key,
cbkeyalign must first be used to align that key cursor with the
current record.
case REQ_NEXT_CARD: /* next card */
rdlock(cbp, CB_RDLCK);
if (cbreccnt(cbp) == 0) {
printf("The rolodeck is empty.\n\n");
rdlock(cbp, CB_UNLCK);
continue;
}
/* use unique key field to set rec cursor */
found=cbkeysrch(cbp,RD_CONTACT,rd.rd_contact);
if (sf != RD_CONTACT) {
/* align cursor of sort key */
cbkeyalign(cbp, sf);
}
if (found == 1) {
/* advance key (and rec) cursor 1 pos */
cbkeynext(cbp, sf);
}
if (cbrcursor(cbp) == NULL) {
printf("End of deck.\n\n");
rdlock(cbp, CB_UNLCK);
continue;
}
cbgetr(cbp, &rd);
rdlock(cbp, CB_UNLCK);
break; /* case REQ_NEXT_CARD: */
Figure 4.5. Next Rolodeck Record
4.5 Closing a cbase
When a program is through accessing a cbase, the cbase should be
closed. Figure 4.6 shows this code from rolodeck.c.
/* close cbase */
if (cbclose(cbp) == -1) {
bexit(EXIT_FAILURE);
}
Figure 4.6. Closing a cbase
A cbase is automatically unlocked when it is closed. A cbase is
normally, but not necessarily, opened and closed only once in a
program.
4.6 Manipulating Exported Data
Exported data often requires some processing before it is used.
For instance, consider modifying an application to add a new field to an
existing cbase. If there is a considerable amount of data already stored,
it would be desirable to use the import/export functions rather than
manually entering the data again. Another common example is moving
data between cbase and another database system, which may use
different field and record separators for exported data.
An ideal tool for processing records in text files is awk. Figure
4.7 shows an awk program for inserting a new field at position two in
all the records in a text file (note that awk field numbering starts at
one, not zero).
The predefined variables FS and OFS are used to set the input and
output field separators, respectively. The predefined variables RS and
ORS are used to set the input and output record separators, respectively.
Setting these variables appropriately is all that is necessary to convert
between text file formats using different field and record separators.
The awk program in figure 4.8 converts text files exported from a
database using the tab character as a field separator for import by
cbase.
BEGIN {
FS = "|"; # set input and output field
OFS = FS; # and record separators
RS = "\n";
ORS = RS;
NEWFIELD = 2; # field to insert
}
# insfld: insert field n of current record
function insfld(n)
{
if (n < 1 || n > NF + 1) {
return -1;
}
for (i = NF; i >= n; --i) {
$(i + 1) = $i;
}
$n = "";
return 0;
}
{
if (insfld(NEWFIELD) == -1) {
printf "Error inserting new 2nd fld.\n";
exit 1;
}
print $0;
}
END {
exit 0;
}
Figure 4.7. awk Program to Insert Field
BEGIN {
FS = "\t"; # set input and output field
OFS = "|"; # and record separators
RS = "\n";
ORS = RS;
}
{
print $0
}
END {
exit 0;
}
Figure 4.8. awk Program to Change Field Separator
Appendix A: Installation Instructions
The cbase library is distributed in MS-DOS format on either two
360K 5.25" diskettes or one 720K 3.5" diskette. On the distribution
diskettes is a directory containing the files for each library and for
rolodeck, the example program. There is also a directory for manx, the
utility used to extract an on-line copy of the reference manual for each
library, and a directory containing installation batch files for different
MS-DOS compilers.
blkio blkio library
btree btree library
lseq lseq library
cbase cbase library
manx manx utility
rolodeck example program
bats batch files for different compilers
If cbase is being installed on an MS-DOS system, the following
two commands will copy the contents of the distribution diskettes onto
the main drive.
> mkdir cbase
> xcopy a:\ cbase /s /v
An operating system besides MS-DOS will require either a facility to
read MS-DOS diskettes or access to an MS-DOS machine from which
files can be transferred by a serial link or network to the target
machine. If the transfer process does not automatically convert the text
files to the format of the target system, an additional conversion utility
may be necessary.
The second step in the installation is set a switch specifying the
host operating system. This switch is the HOST macro in the blkio file
blkio_.h. If the host operating system is MS-DOS, the MSDOSC
macro in the same file must also be set to specify the C compiler being
used.
If the compiler being used is not completely ANSI compatible,
some additional switches must also be set. These are a set of macros
located in the blkio header file blkio.h, each of which specifies a
particular ANSI feature. For instance, the macro AC_PROTO is used to
indicate function prototype support.
The remaining installation instructions for each library, which
should be performed in the directory containing the files for that library,
are given below. Before proceeding, the manx utility should be
compiled and placed in a directory in the path. After all the libraries
have been installed, the final step is to compile the example program;
the instructions for doing this are given in the example program's
readme file.
The MS-DOS installation batch files, install.bat, each take
two arguments. The first specifies the memory model, legal values for
which are s, m, c, l, and h; the library file is named libm.lib,
where lib would be the library name and m would correspond to the
memory model of the library. The second, if present, causes the
reference manual to be extracted from the source code into the file
lib.man, where lib would again be the library name. The main
batch file included with each library is written for Borland Turbo C.
Because there is so little uniformity among C compilers for MS-DOS,
modifications will be required for other compilers. Instructions for
making these straightforward modifications are given at the beginning of
each install.bat. Some batch files modified for other compilers
can be found in the bats directory. Also, if a make utility is
available, the UNIX makefiles may instead be adapted.
A1. The blkio Library
UNIX
1. Set the HOST macro in blkio_.h to UNIX.
2. Install the boolean header file.
$ su
# cp bool.h /usr/include
# ^d
3. Extract the on-line reference manual.
$ make man
4. Build the blkio library.
$ make blkio
5. Install the blkio library. This will copy the blkio header
file blkio.h to /usr/include and the blkio library archive
to /usr/lib.
$ su
# make install
# ^d
MS-DOS
1. Set the HOST macro in blkio_.h to MS_DOS.
2. Set the MSDOSC macro in blkio_.h to the C compiler being
used.
3. If necessary, modify install.bat for the C compiler being
used.
4. Extract the reference manual and build and install the blkio
library.
> install s x
Run again for each additional memory model desired.
A2. The btree Library
UNIX
1. Install the blkio library.
2. Extract the on-line reference manual.
$ make man
3. Build the btree library.
$ make btree
4. Install the btree library. This will copy btree.h to
/usr/include and the btree library archive to
/usr/lib.
$ su
# make install
# ^d
MS-DOS
1. Install the blkio library.
2. If necessary, modify install.bat for the C compiler
being used.
3. Install the btree library.
> install s x
Run again for each additional memory model desired.
A3. The lseq Library
UNIX
1. Install the blkio library.
2. Extract the on-line reference manual.
$ make man
3. Build the lseq library.
$ make lseq
4. Install the lseq library. This will copy lseq.h to
/usr/include and the lseq library archive to
/usr/lib.
$ su
# make install
# ^d
MS-DOS
1. Install the blkio library.
2. If necessary, modify install.bat for the C compiler
being used.
3. Install the lseq library.
> install
Run again for each additional memory model desired.
A4. The cbase library
UNIX
1. Install the btree and lseq libraries.
2. Extract the on-line reference manual.
$ make man
3. Build the cbase library.
$ make cbase
4. Install the cbase library. This will copy cbase.h to
/usr/include and the cbase library archive to
/usr/lib.
$ su
# make install
# ^d
MS-DOS
1. Install the btree and lseq libraries.
2. If necessary, modify install.bat for the C compiler
being used.
3. Install the cbase library.
> install s x
Run again for each additional memory model desired.
Appendix B: Defining New Data Types
cbase is designed to allow custom data types to be defined by the
user. Custom data types are completely integrated, becoming
indistinguishable from those predefined. A data type definition consists
of a macro used as the type name (e.g., t_string), and three
functions: a comparison function, an export function, and an import
function. The comparison function is the most important; it determines
the sort order for data of that type. The export function is used to
export data of the associated type to a text file, and the import function
to import data. Below are given step-by-step instructions for defining a
new cbase data type.
B1. The Type Name
For each cbase data type there is a corresponding type name by
which the user refers to that data type. The type names are defined in
cbase.h.
#define t_char (0) /* signed character */
...
#define t_binary (26) /* binary data */
Type names must be defined as integers, starting at zero and increasing
in steps of one. The type name for a new data type would be added at
the end of this list, and be defined as an integer one greater than the
last data type in the list. To avoid possible conflict with future
predefined types, user defined type names should not start with t_.
The prefix ut_ is recommended for user defined types.
#define ut_new (27) /* new data type */
B2. The Comparison Function
A data type is characterized primarily by its sort order. Each data
type is given a comparison function defining this sort order.
Comparison functions are of the form
int cmp(const void *p1, const void *p2, size_t n);
p1 and p2 are pointers to two data items to be compared, and n is the
size of the data items. The value returned must be less than, equal to,
or greater than zero if the data item pointed to by p1 is less than,
equal to, or greater than, respectively, that pointed to by p2. The C
standard library function memcmp would be a valid cbase comparison
function.
All cbase comparison functions are located in the file cbcmp.c.
For a new data type, a comparison function would be added in this file.
static int newcmp(const void *p1, const void *p2,
size_t n)
{
...
}
Comparison functions are made static because they are accessed by
cbase only through an array of function pointers, cbcmpv, also defined
in cbcmp.c. This array contains the comparison function for each
cbase data type. The integer value of the type name is used by cbase
as an index into this array, and so the comparison functions must be in
the same order as the type names. A pointer to the comparison
function for a new data type would be added at the end of this array.
/* cbase comparison function table */
cbcmp_t cbcmpv[] = {
charcmp,
...
binarycmp,
newcmp
};
B3. The Export and Import Functions
Each data type has an associated export function. This export
function takes a data item of the associated type and writes it to a file
in a text format. Export functions are of the form
int exp(FILE *fp, const void *p, size_t n);
p is a pointer to the data item of size n to be exported. The export
function converts the data item to a text format, then writes it to the
current position in file fp. Upon successful completion, a value of
zero is returned. Otherwise, a value of -1 is returned. See cbexport
in the cbase Programmer's Reference Manual for a description of the
format of exported data.
All cbase export functions are located in the file cbexp.c. For a
new data type, an export function would be added in this file.
static int newexp(FILE *fp, const void *p, size_t n)
{
...
}
Just like comparison functions, export functions are accessed by
cbase through an array. This array, cbexpv, is defined in cbexp.c.
A pointer to the export function for the new data type would be added
at the end of this array.
The import function reads a data item from a text file. Import
functions are of the form
int imp(FILE *fp, const void *p, size_t n);
The parameters and return value are the same as for the export
function. Import functions are located in cbimp.c. Pointers to the
import functions are stored in the array cbimpv.
B4. The Type Count
The macro CBTYPECNT is defined in cbase_.h as the number
of data types defined. It must be incremented by one for each new
data type added.
After completing these steps, rebuild the cbase library (see
Appendix A). The underlying libraries do not need to be rebuilt.
Appendix C: Porting to a New Operating System
The blkio library provides a means for portable access to
structured files just as the stdio library does for text files. blkio is thus
the only library requiring modification to port to a new operating
system. The steps necessary to perform this port are outlined below.
C1. The HOST Macro
In the blkio library's private header file blkio_.h, a macro is
defined for each supported operating system. When installing the blkio
library, the host operating system is selected by defining the HOST
macro using one of these system macros. When porting to a new
operating system, a new system macro must be defined in blkio_.h.
#define UNIX (1) /* UNIX */
#define MS_DOS (2) /* MS-DOS */
#define NEWOS (3) /* new os */
#define HOST NEWOS
In some instances it will be necessary to take into account
variations among the C compilers available for a system. For
MS-DOS, the macro MSDOSC, also defined in blkio_.h, is used to
select a compiler in the same way as the HOST macro is used to select
the operating system. To port to a new C compiler under MS-DOS, a
new compiler macro must be defined.
#define TURBOC (1) /* Turbo C */
#define MSC (2) /* Microsoft C */
#define NEWC (3) /* new C compiler */
#define MSDOSC NEWC
To port to multiple incompatible compilers under a new operating
system, a new compiler selection macro analogous to MSDOSC would
need to be defined.
C2. The File Descriptor Type
In most operating systems, an open file is accessed not by name,
but through some type of handle, usually called a file descriptor. File
descriptors are normally of type int, but this is not guaranteed. A
union is therefore defined (in blkio.h) for the file descriptor.
typedef union { /* fildes type */
char cfd; /* character fildes */
short sfd; /* short int fildes */
int ifd; /* int fildes */
} fd_t;
This type is used to define the fd member of the BLKFILE structure.
typedef struct { /* block file ctl struct */
fd_t fd; /* file descriptor */
...
} BLKFILE;
When modifying the code in subsequent sections, the appropriate
member of the union fd_t would be used to access a file descriptor.
If the file descriptor type for the new system is short, for instance,
the file descriptor for BLKFILE *bp would be accessed as bp->fd.sfd.
It will be necessary to add a member to the fd_t union if one of the
required type does not already exist.
C3. File Access System Calls
The bulk of the operating system specific code is related to the
system calls used to access the file system. These system calls perform
basic operations such as opening, reading, and writing a file, and are
conceptually the same on most systems. In fact, they can usually be
directly translated to a corresponding call on the new system.
All system calls accessing the file system are isolated in the file
buops.c (blkio unbuffered operations). The HOST macro is used to
separate sections of code used for different operating systems.
Similarly, compiler selection macros such as MSDOSC are used to
separate sections of code for different compilers under the same
operating system.
#if HOST == UNIX
/* code for UNIX */
.
.
#elif HOST == MS_DOS
/* code for MS-DOS */
#if MSDOSC == TURBOC
/* code for Turbo C */
.
.
#elif MSDOSC == MSC
/* code for Microsoft C */
.
.
#endif
#endif
When porting to a new operating system (or compiler), each of these
conditional compilations using HOST (or the relevant compiler selection
macro) must be located and an additional #elif added for the new
system (or compiler).
C4. File Locking System Calls
System calls are also used to perform file locking. All system
calls for file locking are located in the file lockb.c. This file must
be modified as was buops.c. If file locking will not be used on the
new system, lockb.c need not be altered.
References
CALI82 Calingaert, P. Operating System Elements. Englewood
Cliffs, NJ: Prentice Hall, 1982.
COME79 Comer, D. The Ubiquitous B-tree. ACM Computing
Surveys, June 1979.
FROS89 Frost, L. A Buffered I/O Library for Structured Files.
The C Users Journal, October 1989.
HORO76 Horowitz, E. and S. Sahni. Fundamentals of Data
Structures. Rockville, MD: Computer Science Press, 1976.
KERN88 Kernighan, B. and D. Ritchie. The C Programming
Language. Englewood Cliffs, NJ: Prentice Hall, 1988.
KNUT68 Knuth D. The Art of Computer Programming Volume 3 /
Sorting and Searching. Reading, MA: Addison-Wesley, 1968.
ULLM82 Ullman, J. Principles of Database Systems. Rockville,
MD: Computer Science Press, 1982.